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Abstract 


We introduce a Bayesian approach to predictive density calibration 
and combination that accounts for parameter uncertainty and model set 
incompleteness through the use of random calibration functionals and random 
combination weights. Building on the work of Ranjan and Gneiting (2010) and 


Gneiting and Ranjan (2013), we use infinite beta mixtures for the calibration. 


The proposed Bayesian nonparametric approach takes advantage of the flexibility 
of Dirichlet process mixtures to achieve any continuous deformation of linearly 
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combined predictive distributions. The inference procedure is based on Gibbs 
sampling and allows accounting for uncertainty in the number of mixture 
components, mixture weights, and calibration parameters. The weak posterior 
consistency of the Bayesian nonparametric calibration is provided under suitable 
conditions for unknown true density. We study the methodology in simulation 
examples with fat tails and multimodal densities and apply it to density forecasts 
of daily S&P returns and daily maximum wind speed at the Frankfurt airport. 

AMS 2000 subject classifications: Primary 62; secondary 91B06. 

JEL codes: C13, C14, C51, C53. 

Keywords: Forecast calibration. Forecast combination. Density forecast, Beta 

mixtures, Bayesian nonparametrics. Slice sampling. 


1 Introduction 


Combining forecasts from different statistical models or other sources of information is 
a crucial problem in many important applications. A wealth of papers have addressed 


this issue with Bates and Granger (1969) being one of the first attempts in this 
field. The initial focus of the literature was on defining and estimating combination 


weights for point forecasts. For instance. Granger and Ramanathan (1984) propose 


to combine point forecasts with unrestricted least squares regression coefficients as 
weights. The ubiquitous Bayesian model averaging technique relies on weighted 
averages of posterior distributions from different models and implies linearly combined 


posterior means (Hoeting et al., 1999). Recently, probabilistic forecasts in the form of 


predictive probability distributions have become prevalent in various fields, including 
macro economics with routine publications of fancharts from central banks, finance 
with asset allocation strategies based on higher-order moments, and meteorology with 


operational ensemble forecasts of future weather (Tay and Wallis, 2000, Gneiting and 


Katzfuss, 2014). 


Therefore, research interest has shifted to the construction of combinations 
of predictive distributions, which poses new challenges. A prominent, critically 
important issue is that predictive distributions ought to satisfy some properties. 
In this paper, we shall focus on the calibration, or reliability, of the predictive 


distributions ( 

Dawid 

1982 1984 

Kling and Bessler 

1989 

Diebold et al. 1998 

Gneiting et al. 

2007; 

Mitchell and Wallis 

2011 

), and build on the widely used beta- 

calibration approach introduced by 

Ranjan and Gneiting 

(2010 

) and [Gneiting and 


Ranjan (2013). Our approach relies on the application of a distortion function to a 
given distribution and develops Bayesian estimation of the distortion function. See 


Artzner et al. (1999), Wang and Young (1998) and Gzyl and Mayoral (2008) for 
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an application of probability distorsion to financial risk measurement and Casarin 


et al. (2015) for an application to joint calibration of predictive densities. It 


should be understood that the scope of our discussion is much wider and that our 
Bayesian inference approach can be extended to other calibration schemes, such as 
the conditional calibration developed in French (1996), West and Crosse (1992), 


West (1992), which is based on a Bayesian conditioning argument. Moreover, we 
will focus on the calibration of linear pool of predictive densities. Note that the 


traditional linear pool (Stone, 1961 Hall and Mitchell 


nonlinear aggregation schemes (Kapetanios et ah, 2015 


2007) has been generalized to 


Gneiting and Ranjan, 2013), 


and to time-varying approaches which account for time instabilities and estimation 


2013). Our contribution can be 


uncertainty in the combination weights (Billio et al. 
extended to these models. 

In this paper, we propose a flexible Bayesian nonparametric approach to 
calibration and combination that relies on beta mixtures, and nests the beta 


transformed linear pool introduced by Ranjan and Gneiting (2010) and Gneiting 


and Ranjan (2013). We develop tools for Bayesian inference for both cases of known 


and unkown number of mixture components. In the case the number of components 
is not known we assume an infinite mixture representation and a Dirichlet process 

This type of prior, and its 


prior (Ferguson, 1973; Lol 1984 


multivariate extensions (e.g.. 


see 


Sethuraman 

1994 

)• 

Miiller et al. (2004 


Griffin and Steel (2006), 


Hatjispyros et al. (2011)), is now widely used due to the availability of efficient 


algorithms for posterior computations (Escobar and We 

St, 1995 

MacEachern and 

Muller, 1998 

Papaspiliopoulos and Roberts 

2008 

Taddj 

^ 2011 

1), including but not 

limited to applications in time series analysis (Hirano 

20021" 

Ghib and Hamilton 

2002[ IRodriguez and ter Horst, 2008| |Taddy and Kottas 

2009 

Jensen and Maheu 

2010t Griffin 

20TTI 

Griffin and Steel 

20TT| 

3urda et al. 

2014; 

Bassetti et al., |2014 

Wiesenfarth et al. 

2014[ Jochmann 

2015 

A recent 

account of Bayesian non- 

parametric inference can be found in 

Hjort et al. 

2010). 

In this paper we develop 


a slice sampling approach that builds on the work of Walker (2007) and Kalli et al. 

dMTI ). 

We provide some conditions under which the proposed probabilistic calibration 
converges in terms of weak posterior consistency to the true underlying density as the 
number of observations goes to infinity. This calibration property is a very powerful 


result, which substantially improves upon the earlier approach of Gneiting and Ranjaif 


(2013), which was shown to be flexibly dispersive only in the sense of second moment 
of the probabilistic forecast. We build on Wu and Ghosal ( 2009a|b ) for the case of 
i.i.d. observations and on Tang and Ghosal (2007) for the Markovian case. In this 
sense, we also contribute to the recent literature on posterior consistency of Bayesian 
nonparametric inference (e.g., see Pelenis (2014) and Norets and Pelenis (2014). 


The remainder of the paper is organized as follows. Section introduces our beta 


3 










































































































































































































mixture calibration and combination model and places it in the context of the general 
density combination approach introduced by Kapetanios et ah (2015). This is followed 


by Section where we propose Bayesian inference based on slice and Gibbs sampling 
methods. Section provides posterior consistency of the Bayesian nonparametric 
calibration and combination in the weak sense under suitable conditions for unknown 
true density and under the assumption of incomplete model set. In Section we 
illustrate the effectiveness of our approach on simulation examples. Section [^provides 
case studies including some well-studied datasets in weather forecast and finance and 
see major improvements in the predictive performance for daily stock returns and 
daily maximum wind speed. The paper closes with a discussion in Section 


2 Beta mixture calibration and combination 


Let Fi,... ,Fm be a set of predictive cumulative distribution functions (cdfs) for a 
real-valued variable of interest, y, which might be based on distinct statistical models 
or experts. Following the forecast combination and calibration literature, we assume 
the cdfs are externally provided. Also, following Ranjan and Gneiting ( |2010 ) and 
Gneiting and Ranjan (2013), we consider combination formulas ; 0 G 0} that 
map the M-tuple {Fi,...,Fm) into a single, aggregated predictive cdf, F{-\6) = 
5'6/(-|Ti, ..., Fm)- Given a sequence of observations, yi,..., yx, the cdf evaluated on 
one observation, e.g. F{yt\9), is referred as probability integral transform (PIT). 


Following Ranjan and Gneiting (2010) and Gneiting and Ranjan (2013), we say that 


the PITs, F{yi\6 ),..., F{yT\9), are well calibrated (or probabilistically calibrated) if 


their distribution is uniform. As noted in Gneiting and Ranjan (2013), well calibration 


is a critical requirement for probabilistic forecasts and checks for the uniformity 
of the probability integral transform have formed a cornerstone of density forecast 
evaluation. 


The aggregation method introduced by Ranjan and Gneiting (2010) and Gneiting 


and Ranjan (2013) considers the beta transformed linear pool 


M 


F{y\0) = ^ UJmFmiy) 


( 1 ) 


^ m=l 


for y G M, where 6 = (a, j3, cj), denotes the cdf of the standard beta distribution 
with parameters a > 0 and /? > 0 and density function ba,i 3 ix), proportional to 

^a-l(i _ 

on the unit interval and uj belong to the the unit simplex in 
Am = = (wi,.. .,u]m) e [0,1]^ : ^ = l| . 

I m=l ) 


4 


























We interpret as a parametric calibration function, which acts on a linear 

combination of Fi,...,Fm with mixture weights u G When a = 1 and 

13 = 1, the calibration function is the identity function, and the beta transformed 
linear pool reduces to the traditional linear pool. In the case M = 1, the beta 
transformation serves to achieve calibration; when M > 1, we seek to combine and 
calibrate simultaneously. The linear combination weights assign relative importance 
to the individual predictive distributions, and the beta transformed linear pool admits 
exchangeable flexible dispersivity in the sense of the second moment of the PITs 
histogram (Gneiting and Ranjan 2013). However, in order to achieve the stronger 


result of probabilistic calibrated PIT the parametric class of calibration functions 
needs to be extended. 

In point of fact, combination formulas Q can be generalised to 

M 

. . . ,Fm) = ^ U}mFm{y)y, E Am,C E c| 

m=l 

where C is a class of distribution functions on [0,1]. Given a single observation 
y with continuous and strictly increasing cdf Fq, then one can write Tb(y) = 
CoiE^=lU:mFm (y)), where Co{z) = Fq{F ^(2;)) and F ^ is the inverse function 
of F = Ylm=i^niFm- This shows that if for any Fq the corresponding Cq belong to 
C, then the PITs of the model are well calibrated. In practice the distribution Fq is 
not known and one is left with the issues of choosing the class C and estimating the 
calibration formula. The choice of C should achieve a compromise between parsimony 
and flexibility. It is well known that any continuous function g on the closed unit 
interval [0,1] can be well approximated by a beta mixture in the sup norm. See, 


e.g.. Theorem 1 in Diaconis and Ylvisaker (1985) or Altomare and Gampiti (1994), 


p. 333. Moreover any continuous density on (0,1) with finite entropy can be well 
approximated with respect to the Kullback-Leibler divergence by a finite mixture 


of beta densities, see Theorem 1 in Robert and Rousseau (2002). For this reason, 
we extend ([^ of Gneiting and Ranjan 


(2013) by introducing an aggregation method 


based on beta mixture combination formulas 

K 


M 


B{y\Q) — Wk Baf.^/3k I ^kmFmiy) 


( 2 ) 


k=l 


^ m=l 


for y E M, where 6 = {w, a, f3, lj), the vector w = {wi ,..., wk) E Ak comprises the 
beta mixture weights, a = (ai,... ,aK) and /3 = (/3i,... ,/3k) are beta calibration 
parameters, and u = (cji,..., ur), with = {tOki, ■ ■ ■ > ^ku) £ Am, k = 1... ,K 
the component-specific sets of linear combination weights. 

When K < 00 we refer to the general beta mixture model in ([^ and (§ as the 
BMx model, which is much more flexible, and has the beta transformed linear pool 
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proposed by Ranjan and Gneiting (2010) and Gneiting and Ranjan (2013) as a special 


case for K = 1. When K is unknown, Bayesian inference can provide guidance in 
choosing appropriate compromises between parsimony and flexibility. In particular, 
our Bayesian approach allows us to treat the parameter K as unbounded and random. 
We refer to this latter setting as the infinite beta mixture or BM^o calibration, for 
which we give details in the following section. 

As a final remark, we note that the beta mixture calibration and combination 
model can also be interpreted in terms of generalized linear pool, introduced by 
Kapetanios et al. (2015). Assume Fi,..., Fm admit Lebesgue densities /i,..., Jm, 


respectively, then the combination formula ([^ can be written equivalently in terms 
of the aggregated probability density function (pdf), namely 


K 


M 


M 


fiy\o) = Yl '^k I 'y ^ ^kmfm{y) 


^c^k,l3k 




(3) 


k=l 

which can be written as 


. m=l 


. m=l 


M 


fiyW) = fra{y) 


m=l 


for y G M, where the generalized weight functions are given by 


K 


M 


^miy} — ’^^^kmWk^OLk, jik ( ^ ^ ^kmFm{y) 


k=l 


, m=l 


for m = 1,... ,M. We should notice that this simple result provides an alternative 
interpretation of our model as a generalized combination model, similarly to the 

However, there are two major differences with 


scheme of 

Kapetanios et al. 

(2015 

respect to 

Kapetanios et al. ( 

2015). 


of the PITs, thus our model should be used each time calibration, and not only 
combination, is needed. Secondly, they use weight functions that are piecewise 
constant, whereas the weight functions implied by the beta mixture model are 
continuous, thus allowing for a smooth combination density. 


3 Bayesian inference 

In Bayesian settings, it is convenient to express the standard beta distribution with 
parameters a > 0 and /3 > 0 in terms of its mean /r = a/{a + f3) and the parameter 


u=a+P>0{ 

Epstein 

1966 

Robert and Rousseau 

2002, fBillio and Gasarin 

2011 

Casarin et al. 2012p. We refer to the reparameterized pdf as 


KJx) = 


Tiu) 


r(/xi/)r((i -^)zz) 




6 


















































where T denotes the gamma function, and we use the symbol -B* j, to denote the 
corresponding cdf. 

We discuss inference in a time series setting at the unit prediction horizon, where 
the training data comprise the predictive cdfs Fit, ■ ■ ■, Fmi, which are conditional on 
information available at time t — 1, along with the respective realization, yt, at time 
t = 1,... ,r, respectively. We then wish to estimate a calibration and combination 
formula of the form ([^ that maps the tuple Fu ,..., Fmi into an aggregated cdf, Ft. 
In practice, we use the estimated calibration and combination formula to aggregate 
the predictive cdfs Fi^t+i, ■ ■ ■, Fm,t+i, which are based on information available at 
time T, into a single predictive cdf, Ft+i, for the subsequent value, yr+i, of the 
variable of interest. Extensions to multi-step ahead forecasts is possible, and we leave 
this for further research. 

To ease the notational burden in the time series setting, let = {uJki, ■ ■ ■, oj^m) £ 
Am, and write 


M 


Ht{yt\uik) = ^ OJkmFmtiVt 


m=l 


and 


M 

E 

m=l 


^ ^ ^km fratiUt) 


(4) 


(5) 


for t = 1,..., T and k = 1,2,..., K, respectively. 


3.1 Bayesian finite beta mixture model 

We work with a reparameterized version of the finite beta mixture calibration and 
combination model (i.e., K < oo), in which the aggregated cdf and pdf can be 
represented as 


K 


Ft{yt\e) = Y,WkB;^^,^iHt{yt\u:k)) 


k=l 


and 


K 


Myt\o) = '^kHyt\^k)bl,^,ut,{Ht{ytM) 


( 6 ) 


(7) 


A:=l 


for t = 1,...,T. The parameter vector for the BMi^- model can then be written 
as 0 = {w,ti,u,Lj), where w = {wi,...,wk) G M = {yi, ■ ■ ■, Hk) G (0,1)^, 
u = {vi, ... ,vk) G (0,oo)^ and u = {uji,... ,uk) G A^, with K being a fixed 
positive integer. The parameter space is defined as 0 = /S.k x (0,1)^ x (0, oo)^ x. 


\K ^ aK 
■ ^M- 
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Our Bayesian approach assumes that 


w 

~ 'Dir{^u,i,... 

(8) 

Afc 


(9) 

I'k 

~ Gai^ui,^u2), 

(10) 


~ ■ ■ ■ ,^ujm) 

(11) 


ioY k = 1,... ,K, where Be{a,j3) is a Beta distribution with density proportional to 
^a-i(l_^)/3-i ^ Qa{'^, (5) is a Gamma distribution with density proportional 

to x'^ exp{—(5x} for x > 0, and Vir{^i ,..., ^m) is a Dirichlet distribution with density 
proportional to {wi,... ,wm) £ ^M, with all these distributions 

being independent. Guided by symmetry arguments in the Beta and Dirichlet case, 
and using a standard, uninformative prior in the Gamma case ( jSpiegelhalter et al. 
2004), we parameterize parsimoniously and set = ••• = = CiJ, 2 , 


= ^u 2 , and = • • • = In what follows, we refer to the common hyper¬ 
parameter values as and respectively 

Adopting a data augmentation framework (Friihwirth-Schnatter, 2006), we 


introduce the allocation variables dkt £ {0,1}, where k = 1,..., AT and t = 1,..., T. 
The likelihood of the BM^^ calibration model is the marginal of the complete data 
likelihood 


T K 


\ ^kt 


L{Y, D I 0) = n n (zcfe ht{yt\uk) 

t=i k=i 

where we let Y = {yi, ..., yx) and D = (dn,..., d^i,..., diT, • • •, (Ikt)- The implied 
joint posterior of D and 6 given the observations Y satisfies 


K 


'K{D,e\Y) (X g{fj.,u,uj) 


w 




k=l 


XI htiyt\^k)t>] 

teCfe 




{Ht{yt\oiik)), 


where is the prior density, = {t = 1,... ,T \ d^t = 1}, and is the 

number of elements in Dk- To sample from the joint posterior, we use a Gibbs sampler 
that draws iteratively from 7r{D \ 9, Y), 7r(/x, i'\w,u, D, Y), Tr{u \ w, fi, i/, D, Y), and 
tt{w I (jj, fi, u, D, Y), respectively, for which we give details in Appendix Sl.l[ 

The output of the algorithm is a sample 0^*^ = (ruW,for i = 
where I is the number of iterations in the Gibbs sampler. The sample is 
used to approximate with FT+iiyr+i) the desired one-step-ahead cumulative posterior 
predictive distribution FT+i{yT+i) = Jq BT+i(yT+il^)'^(^l^)d9, where 7r(0|y) is the 
marginal distribution of 7r(D, 0|T). In the special case when AT = 1 we get 


Tt+i(2/T+i) = ( X] Fm,T+i{yT+i 


M 


( 12 ) 


2 = 1 


\m=l 
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which can be thought of as a Bayesian implementation of the beta transformed 
linear pool 0 of [Ranjan and Gneiting] (|2010[) and [Gneiting and Ranjanj (|2013[) . 


An advantage of the proposed approach based on Gibbs sampling approximation is 
that parameter uncertainty can be taken into consideration in the prediction. A 
plug-in approximation of the predictive, which does not account for the parameter 
uncertainty, can be used, namely FT+i{yT+i) = -^T+i(yT+i|^) where 6 is the 
parameter posterior mean which can be approximated by the empirical average of 
0(*) i = Ij... j /. Another advantage of our approach is that credible intervals for the 
calibrated predictive cdf can be easily approximated by using the output of the Gibbs 
sampler. 


3.2 Bayesian infinite beta mixture model 

In the finite-mixture beta calibration and combination model the number of beta 
densities is given, and model selection procedures can be used to choose the number 
of mixture components. As evidenced in previous studies (see, e.g., Billio et al. (2013) 


and Kapetanios et al. (2015)), in a time series context the model pooling scheme can 


be subject to time instability, thus as a new group of observations arrives the pooling 
scheme can change dramatically. Moreover, Geweke (2010) discusses how Bayesian 


model averaging probabilities and linear opinion pooling weights converge in the limit 
to select one model (or a subset of models) and therefore they might not properly 
cope with such instability. The finite-mixture beta calibration and combination model 
might be subject to similar issues and to solve it we propose to work with an infinite 
prior number of calibration functions and local pooling schemes of which only a finite 
number are selected on a given finite sample. The consequence is that the number K 
of beta mixture components can vary and increase with the sample size, accounting 
for time instability in the pooling. Therefore, the model with infinite calibration 
components provides an answer to the problem of selecting the number of components 
in the finite mixture approach. 

We propose here a Bayesian non-parametric models which allows for estimating 
the number of components and also for including the model uncertainty in the 
posterior predictive. We refer to this model as the infinite-mixture calibration model 
BMaa- Let us assume 


ft{yt\0) = 5 * ,, {Ht{yt\u;)) ht{yt\i^), 

where 6 = (/i, i^, cj), with u = (wi,..., um)- Our prior for the BM^ parameters 6 is 
nonparametric, i.e. 6 ~ G{6) where G is a random probability measure 

G^DP{^|J,Go) 
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and DP{'il)^ Gq) denotes a Dirichlet process (DP) (jFergusoii (jl973)) with concentration 


parameter and base measure Gq- Following the standard result of Sethuraman 
(1994), the Dirichlet process prior can be represented as 

OO 

G{de) = Y,Wk5eM(^) 


k=l 


with random weights Wk generated by the stick-breaking construction 


k-l 

= t’fc ]^(1 - Vl) 
l=l 

where the stick-breaking components vi are i.i.d. random variables from Be(l,ip). 
The atoms 6k are i.i.d. random variables from the base measure Gq. In our model 
the base measure is given by the product of the following distribution 

The Dirichlet process prior assumption and the stick-breaking representation of the 
DP allow us to write the combination and calibration model in terms of infinite 
mixtures of random beta distributions with the following random pdf 


ft{yt\G) = I ft{yt\e)G{de) 




k=l 


The number of components sampled in the first T observations is random and its 


prior distribution is (Antoniak (1974)) 


P{K = fe|V’,T) = 


T\T{g) 
r(V' + T) 


IsTfclV’* 


where sxk is the signed Stirling number (Abramowitz and Stegun 


for fc = 1, 2,.. 

1972, p. 824). The dispersion hyper-parameter ■0 > 0 is driving the prior expected 
number of components. Large values of ■0 increase the probability of introducing 
new components in the mixture. As the prior dispersion depends crucially on this 
parameter, the results of the posterior inference on the infinite mixture model are 
usually presented for different values oltp. It also possible to extend the nonparametric 
models by assuming a further stage of the prior hierarchical structure and assuming 
a prior for ■0. A common choice for the prior is a gamma distribution, Qa{c, d) 
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(see Escobar and West (1995)). The second important feature is that our inference 
approach provides, as a natural product, the posterior distribution of the number 
of components given a sample of data and allows for the inclusion of the number of 
components uncertainty in the predictive density. 

Inference on infinite mixture models resulting from a Dirichlet prior assumption 
requires the use of simulation methods. Gibbs samplers have been proposed in 


Escobar (1994) and Ishwaran and James (2001), which make use of the Polya-urn 


representation of the Dirichlet process. Ishwaran and Zarepour (2000) proposed a 


sampler based on a truncation of the infinite mixture representation. Papaspiliopoulos 


and Roberts (2008) proposed an exact simulation algorithm based on retrospective 


sampling. In this paper we apply the slice sampling algorithm proposed in Walker 


(2007) and Kalli et al. (2011). The algorithm uses a set of auxiliary variables to deal 


with the infiniteness problem of the mixture model. More specifically, let us introduce 
a sequence of slice sampling variables tt*, t = 1, 2,..., T, then ft{yt\G) is the marginal 
of 

OO 

ftiyt,ut\G) = {Ht{yt\uk)) ht{yt\u:k) 

k=l 

Note that given a set of observations, yt and slice variables, ut, t = 1,... ,T, the 
complete data likelihood can be written as 


LiY,U\G) = RE iHt{yt\uk)) htiyt\uJk), 

t=lk&At 

where Y = (yi,..., U = {ui, ... At = {k\ut < Wk}- Note that 

Nt = Gard{At), that is the number of components of the infinite sum, is finite 
when conditioning on the slice variables. Thus, the introduction of the auxiliary 
variables allows us to have a finite mixture representation of the infinite mixture 
model. Eollowing a standard approach to inference for mixture models (e.g., see 


Priihwirth-Schnatter (2006)) we now introduce a sequence of allocation variables, dt, 
t = 1,... ,T, with dt € At- Each of these variables indicates which component of the 
finite mixture provides the observation yt. The complete data likelihood is 


L{Y,U,D\G) = 


t=i 


where D = {di,..., dx). 

Let us denote by E = {vi,V 2 ,...) and 0 = { 6 i,d 2 ,...), with Ok = 
{y-k,k'k,ujk), <^k = {^ik, ■ ■ ■ ,^Mk), the infinite dimensional vectors of the stick¬ 
breaking components and atoms respectively. In what follows we assume the 
dispersion parameter ip is unknown with prior distribution 
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From the completed likelihood function and our assumptions on the prior 
distributions, the joint posterior distribution of U, D, V, 0 and xjj given Y is 

T 

7r{U,D,V,e,ij\Y) ht{yt\^dt) 

t=i 

M 

• n(l “ exp{-^^z^fc/2} 

k>l i=l 


Joint sampling from the posterior is not possible and this calls for the application 
of a Gibbs sampling procedure. Adapting the sampler described in Walker (2007) 
and Kalli et al. (2011) to our setting, we develop an efficient collapsed Gibbs 
sampling procedure which generates sequentially the parameters and the latent 
variables from the full conditional distributions 7r(0|t/, D, V, Y, tp), 7r(V, U\Q, D, Y, ip), 
7 r{D\Q, V, U, Y, Ip) and iT{ip\Y). The details of the steps of the Gibbs sampler are given 
in Appendix |S1.2[ 

The output of the algorithm are samples and 0^*^ = !/(*),a;(*)) for 

i = 1,... ,I where I is the number of MGMC iterations, and can be used to sample 
from the one-step-ahead cumulative predictive distribution. For further details see 


Appendix SI.2 


4 Posterior consistency 


In this section we provide some conditions under which the proposed probabilistic 
calibration formula, BM^o, converges to the true underlying density, implying 
uniformity of the PITs in the limit. The convergence is studied in terms of weak 
posterior consistency when the number of observations in the training set goes to 
infinity. We consider the case the set of combined models is externally provided 
and both combination weights and calibration parameters are estimated. We prove 
consistency of our BM^o calibration for i.i.d. observations and show that posterior 
consistency is still valid under the assumption of Markovian kernels. As commonly 
found in the literature, for non i.i.d. observations, the posterior consistency is case- 
specific depending heavily on the model used. See, for instance |Tang and Ghosal 


(2007) and Ghoudhuri et al. (2004). 


4.1 Consistency results for i.i.d. observations 

Let T be the set of all possible densities (with respect to Lebesgue measure) on 
the sample space T C M and FI* be a prior on T. In order to deal with posterior 
consistency one needs to specify which kind of topology is assumed on T. Since we are 
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interested in weak consistency, we consider the topology of weak convergence, i.e. the 
topology induced by the weak convergence of probability measures. Given a prior IT* 
on T^ the posterior is said to be weakly consistent at /o if U*{U\yi ,..., yn) converges 
a.s. to 1 for every neighbourhood U of /o in the topology of weak convergence, for 
every i.i.d. sequence yi,y 2 , ■ ■ ■ of random variables with common density /q. For more 
details see, e.g.. Chapter 4 in Ghosh and Ramamoorthi] (2003). 

In the i.i.d. case, Schwartz theorem states that weak consistency at a “true 
density” /o holds if the prior assigns positive probabilities to Kullback-Leibler 
neighborhoods of /q. Hence, in this setting, in order to prove week consistency, one 
only needs to check if the Kullback-Leibler property is satisfied by the prior setting 


and the true density /o, see Theorem 4.4.2 in Ghosh and Ramamoorthi (2003) 


It is worth recalling that a Kullback-Leibler neighbourhood of a density f £ J- of 
size £ is defined as 

7 ' 


ICsih) = 




flog[ -j <£ 
G -F, 


and the Kullback-Leibler property holds at /o G T, for short /o G KL{Il*), if 
n7/c7/o)) > 0 for all e > 0. We will denote with supp{fj,) the weak support of 
a probability measure p and with KL{f,g) the Kullback-Leibler divergence between 
the two densities / and g, i.e. KL{f,g) := f flog (^). 

In this section we will exploit the type I mixture prior representation of H*. Let 
us recall that a prior on IF is said to be a type I mixture prior if it is induced via the 
map 

Ge^fG{y)= f K{y;e)G{de), (13) 

Je 

where 0 is the mixing parameter space, K{y, 6) a density kernel on T x 0 and G has 


distribution H on the space A4(0) of probability measures on 0 (see Wu and Ghosal 


(2009a)). In our joint calibration and combination model, the kernel is 


K{y-e) = b*{H{y\u))h{y\u) 


(14) 


with 6 = {Op, 6c ), where Op = u indicates the pooling parameters, and Oc = {p, n) the 
calibration parameters. Since in this subsection we deal with of i.i.d. observations, 
we drop from the kernel K the observation index t. The random mixing distribution 
n is given by a Dirichlet process prior, so that 0\G ~ G where G ~ DP{ip, Gq). For 
the sake of simplicity we assume that the concentration parameter -0 is given. In this 
way, n* turns out to be the prior on F induced by 




b*p^v{H{y\u))h{y\u)G{dudpdv) 


when G ~ DPifj, Go) and h{y\u) = J2m=i ^mfm{y)- 
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Theorem 4.1. Assume that there is a point to* in the interior of Am such that 
h{-\u>*) is continuous and that, for every compact set Cay, iniy^c h{y\u*) > 0. 
Assume also that the true density /o is continuous on y and that 


J [I ^og{H{y\u*))\ + I log(l - H{y\u*))\]fo{y)dy < +oo 
and KL{fQ,h{-\u>*)) < +oo. 


(15) 


If Go has full support, then fo G KLfn*). 


The proof of the previous theorem is given in the Supplementary Materials S3 
The result in Theorem |4.1| is of general validity and the theorem assumptions 


can be easily checked for many applied contexts. In Supplementary Materials S2 


we 

provide two additional results for the case not only the combined models are given, 
but also the combinations weights are fixed, see Theorems S2.1||S2.2 In Theorem 
4.1 we assumed the combination weights co* belong to the interior of Am- This 
assumption is not restrictive and does not necessarily implies neither that the true 
density fo is a mixture of fm nor that fo is one of the models in the combination. 
In the following examples, we provide a clearer explanation of this aspect and show 


how Theorem 4.1 assumptions are satisfied for the Gaussian mixture and Student-t 
mixture examples considered later on in this paper for the simulation study. We 
also considered the two cases the set of combined models includes the true density 
(complete model set) and does not include it (incomplete model set). 

Example 4.1 (Complete and incomplete model set). Consider the case in which 

M 

Kv\^) = X] ^rn^iy\h'rn,crlf) and fo{y) = y:>{y\yo,crl) 

m=l 

where cr^) is the pdf of a normal distribution of mean y and variance ci^. When 

(/io, o'o) ~ ihm, o'm) fai" some m = 1,..., M the model set is complete, otherwise it is 
incomplete. In both cases it turns out that fo G KL{Il*). 


In order to apply Theorem f.l one needs to check that (15) is satisfied for some 


u* in the interior of Am- We shall show that this is true for the equal weights linear 
pooling, co* = (1/M,..., 1/M). To this end, denoting by <I>(-|p,cr^) the cumulative 
distribution function of q:>{-\y,a'^), observe that: 

(i) given a mixture of M normal distributions with means and variances {yrnWm)j 
m = 1,..., M, if 0 < a- < min^ CTm < max^ < cr+, then there are two 
constants C~ and C~^ such that, for every y. 


M 


C V7(y|0,cj^) < '^u}mT{y\hm-,(ylf) < <^+(^(^10, 


a. 




m=l 
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See Lemma S3.1 in Supplementary Materials. 


(a) as y ^ + 00 , one has (1 — ^{y\0,l))/ip{y\0,l) ~ 1 /y and hence | log(l — 
Using (i) and (ii) one can check that 

I log(l -H{y\uj*))\ < Cmaxll log(l - ^>(y|0, cr-^)|, | log(l - ^>(y|0, cr+^)|} < C'y"^ 
for suitable constants C,C'. Hence 

j I log(l - i?(y|cJ*))|/o(y)dy < y‘^ip{y\nQ,al)dy < +oo. 


Analogous considerations hold for | log(-ff(y|cj*))|. Hence the first condition in (15) 
is satisfied. Using (i) and the fact that K L{ip{-\yi, af), o'^)) < +oo, it is easy 

to obtain also that KL{h{-\io*), fo) < +oo. 

Example 4.2 (Mixture of normals). Following the same line of the previous example, 
one can treat the case in which 


M 


K 


Ky\^) = X] crli) and fo{y) = ^ 


m=l 


2=1 


with possibly K > M. Here some of the true parameters i = 1,... K, can 

be equal to some of the parameters of the models {nm, o'm)? rn = 1 ,... M. Also in this 


case it is easy to see that the assumptions of Theorem \4. 1\ are satisfied for any ui* in 
the interior of Am. 

The following example shows that the assumptions can be checked also in the 
cases the set of models is incomplete and the true distribution has heavy tails. 

Example 4.3 (Heavy tails). Consider the case in which 


M 


K 


Ky\a^) = ^ aj^F{y\Um,(y‘L) and fo{y) = 


m=l 


2 = 1 


where Tfj,,a,u is a t-distribution with location, scale and degrees of freedom paramters 
fj.,a and v respectively. Since fo{y) ~ Cy~''~^ as |y| —)■ +oo, arguing as in the 


previous example, it is easy to see that (15) is satisfied whenever v > 2. In this case 

heKL{W). 
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4.2 Consistency results for Markovian observations 

Let T be the set of all possible transition densities (with respect to Lebesgue measure) 
on the sample space 3^ C M. As in the i.i.d. case, we assume that set of combined 
models is given, but the distribution of the current observation yt given all the past 
observations, satisfies the Markovian property: Fmtivt) = Fm{yt\yt-i), where 


{yt\vt-i) = / fm{v\yt-i)dy, 

J {-ao,yt\ 


m = 1, 


t = 1, 


.., M}, subset of 
i{yt\yt-i) and 


for a given set of transition densities {fm = 1 

F. Hence Ht{yt\uj) = H{yt\yt-i,u;) where H{yt\yt-i,uj) = J2m=i^rnFr 
u = (cji,.. 

In order to introduce the definition of weak consistency for Markovian 
observations, let us consider a real valued ergodic Markov process {yo,yi,...) with 
true transition density fo{y\x) belonging to F and stationary distribution vr and 
write for the distribution of the inhnite sequence {yo,yi,. .If H* is a prior 
over F, we shall abbreviate the posterior n*(-|yo,--- ,yn) by n*(-) and by a.s., we 
shall mean a.s. with respect to P^- A sequence of posterior distributions n*(-) 
(n > 1) is said to be weakly consistent at /o if for every weak neighbourhood B 
of /o, it follows that 


0 a.s. as n —7- +oo. Following Tang and Ghosal 


(2007), the weak topology is induced by the sub-base of neighbourhoods (of a point 
/o) AgAfo) :={/:/ f ff(y)f(ylx)dy - f g(y)fo(ylx)dy A(dx) < ej where A is a 
fixed probability distribution and g varies among bounded continuous function on y. 
A standard choice for A is vr. 

We need also to introduce an adequate generalization of the Kullback-Leibler 
property: a Kullback-Leibler neighbourhood of size e of a transition density fo £ F 
with stationary distribution vr is defined as 

^e(/o) := |/ G T": j j foiy\x) log dyn{dx) < e 

and the Kullback-Leibler property holds at /o G F, for short /o G KLiJl*), if 
U*{}Ce{fo)) > 0 for all e > 0. 

Differently from the i.i.d. case, the Kullback-Leibler property alone is not enough 
to ensure weak consistency and needs to be complemented with Corollary 4.1 in |Tang| 
and Ghosal ( 2007| ) (see proof of Theorem |4.2[ ) . 

Recall that we are dealing with a prior H* induced via the map G i— >■ fc{y\x) := 
Jq K{y\x, 0)G{d0), where G has a Dirichlet process prior with base measure Gq and 
K{y\x,e) = b*^ ,^{H{y\x,u))h{y\x,u;) with 6 = {ep^Oc). 
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Theorem 4.2. Assume that the functions /m('l') are continuous ony xy and that, 
for every compact set C C y x y, fm{y\x) > 0. Assume also 

that for every compact set C" C T and every rj G (0,1) there is L > 0 such that 


max sup 



fm{y\x)dy < T]. 


(16) 


fo{y\x) = J b*^ i,{H{y\x,u))h{y\x,io)G*{dfidndu), G* being a probability measure 
on (0,1) X M+ X /S.M, such that supp{G*) C supp{Go) and the marginal distribution 
Go{dpLdv X Am) = Go{dfidndio) has compact support on (0,1) x M+. If 
R-{y\x) < H{y\x,u:){l — H{y\x,u:)) < R^{y\x) and r_(y|x) < h{y\x,u;) for every 
y, X and every u with 


\og{R-{y\x)) 


+ 


log 


R, 


r_ 


+ log(/o( 2 /|x)) fQ{y\x)dy'K{dx) < +oo, (17) 


then fo G KL(n*) and the posterior is weakly consistent at fo- 


The proof of the previous theorem is given in the Supplementary Materials S3 


Example 4.4 (Mixture of autoregressive processes). Consider the case in which the 
models are normal autoregressive processes of the first order, i.e. 


M 

Hyt\yt-i,aj) = u:mi>iyt\ym + 4>myt-i,crli) 

m=l 


where (/?(-|/U, cr^) is the pdf of a normal distribution of mean g, and variance and 
M > 2. Assume that the data are generated from the following dynamic mixture 
model (mAR) 


yt pSAf{p,i + (l)iyt-i,ai,pi) + (1 -p)SM{p 2 + 4'2yt-i,cr2, P 2 ) 


with Pi G {—1,1}, where SM{p,,a, p) denotes a skew-normal (see Azzalini and 
Capitanio 1(2003)) with p, a and p the location, scale and asymmetry parameters. 


Recalling that the density of a skew normal distribution is 2^{y\p,a'^)ip{y\p,a‘^) 
for p = I and 2(1 — ^{y\p,a'^))p{y\p,a‘^) for p = —1, one can write fo{y\x) = 
jb*^,j{H{y\x,u))h{y\x,u)G*{dpdvdu) for a suitable G*. For example if M = 
2>, Pi = 1 and p 2 = —1, then G*{dpdvdu) = pd(i^Qfi){du)52/^^‘i{dpdu) + (1 — 
p)5{QiQ-^{du)5ii‘^ fidpdu). With a little bit of effort, it is possible to show that all 
the assumptions of Theorem \4-^ are satisfied. Details are given in Supplementary 
Materials [ 
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5 Simulation examples 


The consistency results given in the previous section imply uniformity of the PITs 
only in the limit, when the number of observations goes to infinity. On a finite 
sample, different sets of externally provided models can have different forecasting 
performances and one is left with the issue of studying the finite sample properties of 
the probabilistic calibration method. This motivates the following simulation studies. 


5.1 Multimodality and heavy tails 

We assume that a combined predictive distribution can be obtained from the two 
normal predictive distributions with different location and equal scale parameters, 
AA(—1,1) and AA(2,1), where W(/U, cr^) denotes the normal distribution with location 
/i and scale a. Let us denote with and the pdf and cdf 

respectively of a We compare the noncalibrated linear pool (NC) f{y\0) = 

— 1,1) + (1 — Lo)ip{y\2, 1), where 0 = oj and the infinite beta mixture model 
(BMC). The model weights in the linear pooling are estimated by using the recursive 


log score, see e.g. lore et al. (2010). 


In the first set of experiments, we assume that the data are generated by the 
following mixture of the three normal distributions 


yt 


i.i.d. 


0.25) + P2M{0, 0.25) + P3M{2, 0.25), t = 1,..., 1000, 


where p = {pi,P 2 ,P 3 ) £ ^ 3 - The inability of the beta calibration to calibrate this 
liner pooling is documented in the results reported in the Supplementary Materials 
|S4[ In the same Supplementary Materials, a finite beta mixture model is showed 
to outperform the beta calibration model. We apply the BMC model and obtain 
the calibrated PITs given in Fig. To investigate the sensitivity of the posterior 
quantities to the choice of the hyperparameters, we combine and calibrate the cdfs, 
on the same dataset, using two different values of the dispersion parameter, ijj = 1 
and “ip = 5. 

The left charts of Figure report the PITs of the average infinite beta mixture 
calibration (BMC) model and their 99% credibility intervals obtained from 1,000 
MCMC samples after convergence. The PITs of the calibrated model (black lines) 
belong to the credibility interval of the BMC, thus the resulting predictive cdf is well 
calibrated. We should notice that the credibility intervals are usually larger than the 
one obtained using a beta mixture with a fixed number of components. In fact the 
calibrated density accounts for both calibration parameter uncertainty and also for 
the uncertainty about the number of mixture components. A comparison between the 
top- and bottom-right chart also shows that an increase in the value of the dispersion 
parameter usually increases the uncertainty. 
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Figure 1: Infinite beta mixture calibrated (BMC), calibrated (C) and non calibrated 
(NC) combinations for a dataset of 1,000 samples frompiAA(—2,0.25)+p2A/’(0,0.25) + 
P3AA(2,0.25) with p = (1/5,1/5,3/5). PITs cdf (left graph) and calibrated pdf 
(middle graph) of the combination models C (black), NC (red) and BMC (blue) and 
BMC 99% HPD (gray). Prior (black) and posterior (blue) number of components of 
the random BMC model (right graph). 


The credibility intervals (gray lines) obtained with the infinite beta mixture 
calibration model, see Figure always contain the PITs (first column) and the 
predictive density function (second column) of the correct model. The infinite 
BMC seems particularly accurate in the tails (last column). We also note that the 
uncertainty of the number components in the infinite beta mixture implies a wider 
high probability density region (HPD), see gray lines in[^ than that given by the finite 
beta mixture calibration, see third panel in[^ The prior and posterior distributions 
of the number of mixture components in BMC are given in the right graph in Figure 
The posterior density is more concentrated than the prior, suggesting that data 
are informative on the number of calibration components. 

In the second set of experiments, we assume that the data are generated by the 
following mixture of f-distributions, i.e. 

yt • In-l, 1,6) + ^r(2,1,6), f = I,..., 2000, 
where T{p,(J,i') denotes a t-distribution with location, scale and degrees of freedom 
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Figure 2: Infinite beta mixture calibrated (BMC), calibrated (C) and non calibrated 
(NC) combinations for a dataset of 2,000 samples from 1/2T(—1,1, 6) + 1/2T(2,1, 6). 
PITs cdf (left graphs) and calibrated pdf (middle graphs) of the combination models 
C (black), NC (red) and BMC (blue) and BMC 99% HPD (gray). Prior (black) and 
posterior (blue) number of components of the random BMC model (right graph). 


parameters /r, a and u respectively. The results in Supplementary Materials S4 show 
the difficulties of the beta model in achieving well calibrated PITs. In this set of 
experiments we assume V' is unknown. The results of the infinite mixture calibration 
are given in Figure 

Both cdf (first column) and pdf (second column) indicate that the Bayesian BC 
has problems producing well-calibrated predictions. The Bayesian nonparametric 
calibration BMC, on the contrary, produces well-calibrated densities, in particular on 
the tails; see also the 99% credibility intervals. We note that the posterior distribution 
of the number of clusters is more concentrated than the prior, thus there is learning 
from the data on the number of mixture components. Finally, our experiments 
changing the dispersion parameter indicate no substantial changes in the posterior 
density over different hyperparameter values. 


5.2 Dependent observations 

We assume that a predictive density is obtained from the combination of two 
independent normal autoregressive processes of the first order, yt = fJ-i + + eit 

and yt = yi 2 + (pVt-i + £ 2 * with eu ~ Af{0,af) i.i.d., where //i = —1, ^2 = 2 and 
(Ji = a 2 = 0.5. Following the notation used in Example 4.4 we assume the data 
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generating process is a mixture of skew-normal autoregressive processes (mAR) 


yt ~ l:SJ\f{-2 + 4>yt-i,0.5, q) + ‘^SJ\f{2 + g), 


t = 1,..., 1000. We compare the NC and BMC combination schemes defined in the 
previous section. We set the BMC dispersion parameter ^ = 0.1 and consider two 
settings for the autoregressive coefficients: low persistence {4> = 0.5, panel (a) in 
Figure]^ and high persistence (cj) = 0.99, panel (b)). For the skewness parameter, 
we consider one of the cases covered in Example 4.4 i.e. g = —1 (first column in 
panels (a) and (b)). Moreover we show, through simulated examples, that the well- 
calibration property is satisfied also for other two cases: g = —3 and g = —5 (columns 
two and three in panels (a) and (b)). 


6 Empirical applications 


Then, we investigate relative predictability accuracy for the out-of-sample period. 


Precisely, as in Geweke and Amisano ( 

2010 

), 

Geweke and Amisano 

(2011 

) and 

Kapetanios et al. 

(2015 

), we evaluate the predictive densities using the Kullback 


Leibler Information Criterion (KLIC) based measure, utilizing the expected difference 
in the Logarithmic Scores of the candidate forecast densities. The KLIC computes 
the distance between the true density of a random variable and some candidate 
density. Even though the true density is not known, for the comparison of two 
competing models, it is sufficient to consider the average Logarithmic Score (AvLS). 
The continuous ranked probability score (CRPS) at time t for model k is defined as: 

CRPSi,fc = I {Ft,k{z) - dz 

where Ft^kiv) and ft,kiy) are the predictive cdf and pdf, respectively, for model k, 
conditional to the information available up to time t — 1. 

6.1 Stock returns 


The first application considers S&P500 daily percent log returns data from 3 January 
1972 to 31 December 2008, an updated version of the database used in studies such 


ai 

Geweke and Amisano 

(2010 

), Geweke and Amisano 

(2011 

) and Kapetanios et al. 


!015) 

JJ We estimate a Normal GARGH(1,1) model and a t-GARGH(l,l) model 


via maximum likelihood (ML) using rolling samples of 1250 trading days (about five 
years) and produce one day ahead density forecasts. The first one day ahead forecast 


^We thank James Mitchell for providing data. 
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(a) Low persistence {(p = 0.5) 

g = —1 g = —3 g = —5 








(b) High persistence ((p = 0.99) 

g = —1 g = —3 g = —5 








Figure 3: Infinite beta mixture calibrated (BMC), calibrated (C) and non calibrated 
(NC) combinations for a dataset of 1,000 samples from the autoregressive mixture 
model mAR. Top: PITs cdf of the combination models C (black), NC (red) and 
BMC (blue) and BMC 99% HPD (gray). Bottom: prior (black) and posterior (blue) 
number of components of the random BMC model. 


refers to December 15, 1975. The predictive densities are formed by substituting the 
ML estimates for the unknown parameters. We combine the two predictive densities 
using a linear pooling with recursive log score weights, see description in Section 
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Table 1: Average log scores for the Normal GARCH model (Normal), t-GARCH 
model (Student-t), linear pooling (NG) and beta mixture calibration (BMG) over the 
sample period from January 1, 2007 to December 31, 2008. 


Normal 

Student-f 

NC 

BMC 

AvLS -2.311 

-1.650 

-1.827 

-1.450 


5p Also in this section, we refer to it as the non-calibrated model. Furthermore we 
consider our mixture of beta probability density functions (BMC) to achieve better 
calibration properties. We split the sample in two periods. The data from December 
15, 1975 to December 31, 2006 are used for an in-sample calibration of our method 
to investigate its properties over a long period. The data from January 3, 2007 to 
December 31, 2008 for a total of 504 observations, are used for our out-of-sample 
analysis PI Therefore, we extend evidence in Geweke and Amisano (2010) and Geweke 


and Amisano (2011) by focusing on the period related to Great Financial Crisis, with 


the first semester of 2007 considered a tranquil period and the remaining part of the 
sample corresponding to the most turbulent times. In this experiment, we fit the 
calibration over a moving window of 250 days and produce one-day ahead forecasts p] 

First, we compare the two individual models and the two combinations in terms 
of calibration, measured as PIT, over the full sample period (in-sample and out- 
of-sample periods)p] Figure reports calibration results for the in-sample analysis. 
The BMC line is the closer to the 45 degree line, which represents the PIT plot for 
the unknown true/ideal model. This 45 degree line always belongs to the confidence 
interval of the BMC. NC is not calibrated for all quantiles. In particular, on the upper 
and lower tails, the NC differs substantially from the BMC. As in the simulation 
exercises the posterior density for the numbers of beta components in BMC is more 
concentrated than the prior. 

We now turn our attention to the out-of-sample analysis and the log score results. 
Table reports average log scores for the 4 forecasting methods. BMC provides 


^More flexible weighting schemes, such as time-varying weights, can also be computed. 

®In the linear pooling, we use equal weights for the first forecast for the value January 3, 2007; 
then weights are updated summing recursively previous realized log scores of both models. Using 
the forecasts from December 15, 1975 to December 31, 2006 as training sample for log score weights 
reduces drastically forecast accuracy. 

^We also investigated out-of-sample performance over the period from 15 December 1976 to 16 


December 2002, the same sample applied in Geweke and Amisano (20101 and Geweke and Amisano 


(20111. Superior performance of our BMC are confirmed in this longer sample. 

^Figures focusing only on the out-of-sample period provide similar evidence and available upon 
request from the authors. 
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Figure 4: Infinite beta mixture calibrated (BMC), calibrated (C) and non calibrated 
(NC) combinations for the S&PSOO daily percent log returns data. PITs cdf (lines 
from 1 to 3) of the idea model I C (black), combination models NC (red) and BMC 
(blue) and BMC 99% HPD (gray). Prior (black) and posterior (blue) number of 
components of the random BMC model (bottom). 


the highest score. Figure in Supplementary Materials |S5| shows that after the 
initial weeks of January 2007 where models perform similarly, BMC outperforms the 
other three approaches. The t-model provides higher scores than the normal one 
and the non-calibrated combination. The accuracy of the normal Garch model is 
very low during our OOS period, in particular on the extreme events, which results 
in deteriorating NC performance after August 2007, the beginning of the turbulent 
times. Just selecting the t-GARCH version or, even better, applying local weights 
as in our BMG improves accuracy. Figure [TT] in 
density. 


S5 shows the BMC-based predictive 


6.2 Wind speed 


The second empirical example considers the dataset used in Lerch and Thorarinsdottir 
(2013) ^ It consists of 50 ensemble member predictions (Molteni et al. 1996) of wind 
speed at ten meters above the ground, obtained from the global ensemble prediction 
system of the European Gentre for Medium-Range Weather Forecasts (ECMWF). We 
restrict our attention to the ensemble predictions for the maximum wind speed at the 


®We thank Sebastian Lerch and Thordis Thorarinsdottir for providing data and forecasts. 
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Figure 5: Infinite beta mixture calibrated (BMC), calibrated (C) and non calibrated 
(NC) combinations for the maximum wind speed at the station at Frankfurt airport. 
PITs (left top) of the combination models C (black), NC (red) and BMC (blue) and 
BMC 99% HPD (gray). Prior (black) and posterior (blue) number of components of 
the random BMC model (right top). 


station at Frankfurt airport. The station ensemble forecasts are obtained by bilinear 
interpolation of the gridded model output. 

We consider the ECMWF ensemble run initialized at 00 hours UTC with a 
horizontal resolution of about 33 km, a temporal resolution of 3-6 hours and lead 
times of 3, 6 and 24 hours. To obtain predictions of daily maximum wind speed, we 
take the daily maximum of each ensemble member at the Frankfurt location. One 
day ahead forecasts are given by the maximum over lead times. The observations are 
hourly observations of 10-minute average wind speed which is measured over the 10 
minutes before the hour. To obtain daily maximum wind speed, from 1 May 2010 to 
30 of April 2011, we take the maximum over the 24 hours corresponding to the time 
frame of the ensemble forecast. 

The results presented below are based on a verification period from 9 August 2010 
to 30 April 2011, consisting of 263 individual forecast cases. Additionally, we use data 
from 1 February 2010 to 30 April 2011 to obtain training periods of equal length for 
all days in the verification period and for model selection purposes and forecasts from 
May 1, 2010 to 8 August 2010 (100 observations) as initial training period for the 
combination methods. 

Following Lerch and Thorarinsdottir (2013), we consider two competing models: 
the truncated normal distribution (TN) and the generalized extreme value distribution 
(GEV). The TN model is estimated by minimizing the CRPS. The GEV model is 
estimated by maximum likelihood estimation. 

First, we report in-sample results over the sample from May 1, 2010 to 30 April 
2011. Then, we implement an out-of-sample exercise for the period from August 9, 
2010 to 30 April 2011. We report both log score and GRPS results. 

Figure [^reports in-sample calibration results. The BMG line is close to the ideal 
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Table 2: Average log scores (AvLS) and average CRTS (AvCRPS) for the truncated 
normal (TN), the generalized extreme value distribution (GEV), linear pooling (NC) 
and beta mixture calibration (BMC) over the sample period from August 9, 2010 to 
30 April 2011. 



TN 

GEV 

NC 

BMC 

AvLS 

-2.812 

-2.904 

-2.433 

-1.997 

AvCRPS 

1.346 

1.802 

1.314 

0.982 


model and always includes the 45 degree line in the confidence interval. The NC 
performs poorly for small quantiles. The posterior density for the numbers of beta 
components in BMC is more concentrated than the prior, confirming also in this 
exercise that data are informative on the number of mixture components. When 
focusing on the OOS exercise, the BMC predictive distribution predicts accurately 
and provides the highest average LS and the lowest average CRPS in Table Gains 
are substantial, as Figure [To| in Supplementary Materials ^ shows. The distribution 


is often multimodal, see Figure 12 in Supplementary Materials S5, with the highest 
mode at low values of wind speed, and a second mode concentrated around values of 
wind speed greater than 5 meters per second. The truncated normal has too many 
values in the lower and upper tails; the GEV is too skewed to the upper tail, thus 
predicting on average too high values. The NC is also upper biased by the GEV. The 
BMC shifts the probability mass of the predictive distribution from the upper tail to 
the central part and the left tail, thus producing better calibrated forecasts. 


7 Discussion 


We propose a Bayesian approach to predictive density combination and calibration. 


We build on the predictive density calibration framework of Ranjan and Gneiting 
(2010) and Gneiting and Ranjan (2013) and propose infinite beta mixtures as 
prior distributions for the calibration function. We rely upon the flexibility of the 
infinite beta mixtures to achieve a continuous deformation of the linear density 
combination. Each component of the beta mixture calibrates different parts of the 
predictive cdf and uses component-specific combination weights. Thanks to these 
features, our calibration model can also be viewed as a mixture of local combination 
models. Furthermore, our Bayesian framework allows for including various sources of 
uncertainty in the predictive density. 

We provide suitable sufficient conditions for weak posterior consistency of our 
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probabilistic calibration. The results imply uniformity of the PITs in the limit, when 
the number of observations goes to infinity, under both assumptions of i.i.d. and 
Markovian observations. 

We discuss finite sample properties of our methodology in simulation exercises, 
showing how the infinite beta components are adequate in applications with 
multimodal densities, skewness and heavy tails. In empirical applications to stock 
returns and wind speed data, our approach provides well-calibrated and accurate 
density forecasts. 
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Supplementary Materials for: ’’Bayesian Nonparametric 

Calibration and Combination of Predictive Distributions” 
by F. Bassetti, R. Casarin, F. Ravazzolo. 

SI Computational details 

Sl.l Gibbs sampler for the finite beta mixture model 

1. Full conditional distribution of D. Samples from the full conditional of D given 
( 0 , Y) are obtained by drawing sequentially over t, vectors dt = (dit,..., dxt) from 
multinomial distributions with probabilities 

Ti{dkt = 1\0,Y) oc {H{yt\uk)) h{yt\uk) 


loT k = 1,K. 


2. Full conditional distribution o/(/i, i>'). Samples from the full conditional of 
given {w,uj, D,Y) are obtained in a sequence of Metropolis-Hastings (MH) steps on 
a transformed space. Following Bouguila et al. (2006), we let: yk = 1/(1+ exp{— 7 ^}) 
and Vk = exp{Afc}, k = 1,..., K and draw iteratively from 


7 r( 7 fc, Afc|m,u;,F),y) oc 

where = exp{— 7 ^ — Afc}(l + exp{— 7 fc})“^(l + exp{—Afc})“^ is the Jacobian 

of the transform. In the MH we use a Gaussian random walk proposal distribution 
with covariance matrix S = O.O 5 I 2 , which yields acceptance rates of about 0.4. 


3. Full conditional distribution of u. Samples from the full conditional of u given 
(w, fi, u, D, Y) are obtained by drawing iteratively cj^, k = 1,..., K. At each step 
we apply a MH with the prior distribution as proposal. The acceptance probability 
of each MH step is: 

bk,-^kiH^yt\^k))h{yt\uJk) ’ I ’ 

where uj* ~ T>ir{^i ^,..., 


SI 







4-. Full conditional distribution of w. Samples from the full conditional of w given 
(cj, /X, V, D, Y) are obtained by exploiting the conjugacy of the prior distribution, in 
that 

oc Vir{fy, + Ti,... + Tk). 

When K = 1, we replace the single-move Gibbs sampler with a global MH sampler 
with target distribution obtained by applying to the joint posterior 

T 

t=i 

X exp{— 

where Y = (yi,... ,yt), the change of variable /r = 1/(1 -|- exp{—0i}), n = exp{ 02 } 
and uj = 1/(1 -|- exp{— 03 }. We consider a random walk proposal on the transformed 
parameter space accounting for the Jacobian of the transformation, that is, J = 
exp{02 — 0i — 03}(1 + exp{—0i})“^(l + exp{— 03 })“^. Setting the covariance matrix 
to S = diagjO.l, 0.05,0.1}, we achieve acceptance rates of about 0.4. 


SI.2 Gibbs sampler for the infinite beta mixture model 


Let Vf^ = {t = 1,..., T\dt = k} denote the set of indexes of the observations allocated 
to the fc-th component of the mixture and with F = {k\Fk 7 ^ 0 } the set of indexes of 
the non-empty mixture components. Then the cardinality of F, Card(F), gives the 
number of mixture components and D* = sup I? can be interpreted as the number 
of stick-breaking components used in the mixture. As noted by Kalli et al. (2011), 


the sampling of an infinite numbers of 0 and V is not necessary, since only the 
elements in the full conditional pdfs of D are needed. The maximum number of 
atoms and stick-breaking components to sample is N* = maxjt = l,...,T\Nf}, 


_-TV* 

where Nf is the smallest integer such that > 1 — ut- Thus sampling 

from the joint 7r(l/, f7|0, T>, T, ■(/) is obtained by splitting V = (y*,V**), where 
V* = (vi ,..., V£)*) and V** = (u£)*+i,..., ujv*), and by further collapsing the Gibbs, 
that is by sampling from 7r(l/*|0, U, T, ■0) and tt(U\V*,Q, D,Y,^lJ) and then from 
Tr{V**\V*,U,e,D,Y,i}). 

1. Full conditional distribution ofV*. Sampling from the full conditional of V* given 
{D,Q,Y,ip) is obtained by drawing v^, with k < D*, from the full conditionals 


7r(ufc|Zl,F) oc (1 - 


that is, the PDF of a Be{ak + l,bk+'ip) with Ofc = l{dt=fc} and bk = 
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2. Full conditional distribution of U. Samples from the full conditional of U given 

(F, 0, Y, '0) is obtained by simulating from the uniform 

'n{ut\V,D,Y) oc — 

Wdt 

for t = 1,..., T. 

3. Full conditional distribution of V**. Sampling from the full conditional of V** 
given {V*, U, D, 0, Y, tp) is obtained by sampling from 

Ti{vk\U,D,Y)^{l-Vkf-\ 

that is, the PDF of a Be{l, 0), with k = D* + 1,..., N*. 

4- Full conditional distribution of G. Sample the elements k, k = of 0 

given (U, D, V, P, 0), from the full conditional 

7riek\U,D,V,Y) oc l[bl^^,^{Hiyt\LOk))hiyt\u:k) 

t&Vk 

M 

i=l 

for k G F, and from the prior Go for k ^T). We sample from the full conditional by 
iterating over the following steps: 

(a) r:{yk-,Vk\uk,U,D,V,Y,'ip) 

(b) 'K{uk\tik,i^k,U,D,V,Y,'p). 

We apply here the same sampling strategy described for the parameter of the finite 
beta mixture model. 

5. Full conditional distribution of D. Samples from the full conditional of D given 
(V, U, 0, Y, 0) are obtained by sampling from 

Tr{dt\V,U,Y) oc (Hiytl^^dt)) h{yt\aJdt), 

with dt G Nf}, where Nf is defined above. 

6. Full conditional distribution of ip. If the dispersion parameter 0 is assumed to be 
random with Qa{c, d) prior, then an extra step is needed in the Gibbs sampler. More 
specifically, the full conditional distribution of 0 given U, D, V and 0 has density 

7r(0|iF,r) oc S(0,r)0^+‘'“^exp{-d0}l^g(o,+oo), 


S3 



which depends only on the number of observations T and the number of mixture 
components N*, which has been defined above. 

The Gibbs sampler can used to generate draws from the predictive distribution 
FT+i{yT+i)- At each iteration a uniform random variable is sampled from the 
unit interval and 0^*^ is used such that Wj^li < If J > then more 

weights are required than currently exist, and they can be sampled from 
and the additional Oj from Gq. Having taken Oj, Ut+i sampled from 


S2 Calibration consistency 

If the pooling parameters are fixed, say uj = uq, the inference is necessarily limited to 
the calibration parameters 9c = (/r, i'), hence 0 = [0,1] x M'*' and G is a DP process 
on At([0,1] X M+) with base measure Gq and (known) concentration parameter ij}. 

In this special case H* turns out to be the prior induced by 


j b*e^iHiy\u:o))G{dec)h{y\uo) 


when G ~ DP{'ijj, Go). Then, the analogous of Theorem 4.1 is given below. 


Theorem S2.1. Let a;o be a given point in Am such that /i(-|a;o) is continuous and 
for every compact set G C T, 


inf h{y\uo) > 0. 
yec 


(S18) 


Assume that /o is a continuous density on y such that (15) holds with uq in place of 


u 


If Gq has full support, then /o G KL{Il*). 


In the previous theorem, contrary to Theorem 4.1 


uo represents the true 
parameter. We do not assume cjq is in the interior of Am, which means that the 
set of models in the combination scheme can be complete. In some situations, it is 
useful to consider a base measure Go without full support. In this spirit, following 


the techniques of Tang et al. (2007), we can prove the next result. 
Theorem S2.2. Let uq be a given point in Am and let 


fo{y) = uo{H{y\ujo))h{y\uo) 
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with uo{x) = + (1 - wq) /( 04 )xr+ ^]i,u{^)Po{diMiv), Pq being a probability 

measure on (0,1) xM"''. If belong to supp{Go), supp{Po) C supp{Go), and for 

some C > 0 and 0 < p < min(/xo, 1 — /Uq, t'O) wq) one has 

i < +”• 

for A = {pLo + r]){uo + r/) - 1 and B = (1 - /xq + r]){’^o + ^) - 1; then fo G KL{U*). 

We shall notice that the specification of the calibration function uq in the true 
density /o can be used to build a statistical procedure for testing the hypothesis of 
well calibrated PITs against the alternative of not well-calibrated PITs. 


S3 Proofs 

S3.1 Proofs of the results of Sections 14.II and 


The proof of Theorem 4.1 is based on an application of Theorem 1 and Lemma 3 of 


Wu and Ghosal ( 2009a[ |b ). For the shake of clarity we report the statements of these 
results in Theorems IS3.1I below. 


Theorem S3.1 (Theorem 1 and Lemma 3 of Wu and Ghosal (2009a)). Let Q be a 
separable metric spaee. If for any e > 0 there is a probability measure G^ G supp(JT) 
and a elosed set such that 

(HI) /log(/o//Gj/o < e, 

(H2) contains sup(Ge) in its interior and 

fcXv) 


log 


K{y;0) 


< +C)0, 


(H3) infj^gG infeeD^ K{y;9) > 0 for every compact set G <zy, 
(Hf) {6 I—)■ K{y]6) : y G G} is uniformly equicontinuous on D^, 


then fo G KL{U* 


Assumption (HI) corresponds to (Al) in Theorem 1 of Wu and Ghosal (2009a). 


Assumptions (H2)-(H3) correspond to assumptions (A7)-(A8) of Lemma 3 of Wu and 


Ghosal (2009a), while (H5) is slightly different from the original assumption (A9), 


see 


Wu and Ghosal (2009b). In order to apply the results given above, it is useful to 


reformulate Theorem 14.II as follows. 
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Theorem S3.2. Assume that the functions fm{-) are continuous on y. Let uq be a 
continuous density on (0,1) such that 



and 


log(x)| + I log(l — x)\]uo{x)dx < +00 



log(uo(a^))^^o(a^)f^ 2 : < +oo. 


(S20) 


fo{y) = uo{H{y\u>*))h{y\Lj*) with u* in the interior of Am and assume that, for 
every compact set C C y, 

inf h{y\LJ*) > 0. (S21) 

y&C 

Then /o G KLiJl*) whenever Go has full support. 


Proof of Theorem \S3.S\ Here we need to think Am as the set {{oji,... ,ujm-i) G 
[0, uji < 1} endowed with the topology induced by the euclidean norm. 

Clearly, lvm will denote 1 — 

10 . 1 ) 


S3.1. Since 


Verification of HI of Theorem 
f} log(uo(x))uo(x)dx < +00 by Theorem 1 in 


JO 

is 


m 


is continuous 


Robert and Rousseau 


oi l lU. i ) and 
(2002) there 


K, 


Ue{x) = up^{x) = 


Wi 




Xx), 


2 = 1 


where Pf:{dpLdv) := eWi such that KL{uQ,uf) < e. If 

Geidudfidn) := 6ui*idu) x P,,{dpdv), then 


fcAy) = j bl,^AHiy\oa))h{y\cLi)Ge{dujdndn) = Ue{H{y\u;*))h{y\u:*). 
By a simple change of variables, 


KLifoJcJ 


j uo{H{y\uj*))h{y\uj*)log(^ 



uo{H{y\u*))h{y\u*) 

Ue{H{y\u*))h{y\u*) 


dy 


That is 

KL{fojGj = KL{uo,u,)<e. 

Note that supp{Gf) = {^*^*} x and, since Go has full support, 

Ge C supp{Dir{xf,Go)). 

Verification of H2 of Theorem S3.1. One can find a compact set D* in 


(0,1) X (0,+oo) such that D* contains r'i,e)} in its interior. Moreover, 
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recalling that h{y\uj) = u* is in the interior of A^, one can 

find a (sufficiently small) compact set A* C /S.m containing u* in its interior such 
that if cj G A* then Ci^f^h{y\u)*) < h{y\u) < C 2 ,eh{y\u*) for every y. It follows 
that Dg = A* X D* is a compact set containing supp{Ge) in its interior. Noticing 
that if a; G A* then C2^eH{y\u*) > H{y\u) > C\^eH{y\(jJ*) and C' 2 ,e(l — H{y\(jj*)) > 
(1 — H{y\u)) > C'i,e(l — H{y\u}*))^ one can write 


I^{y) := inf K{y,u,p,u) 

{uJ,fl,U)£De 

{cj,li,iy)eD, B{pu, (1 - p)u) 

> C3,My\^*)Hiy\u:Y^-\l - H{y\u:*)f‘-^ =: It{y) 

where C3,. = /B{pu,{l - p)u) : {p,v) G Dl) > 0, A, = 

sup{/uz/ : (/i, z^) G } > 0 and B^ = sup{(l — p)^ : {p, v) ^ D*} > Q. Hence, one the 
one hand fcSv) — ^e{y) and hence log{fG^iy)/Ie{y)) A 0, on the other hand 


log 


fcAy) 

h{y) 


< J log 


< 


>0 


fcM) 

i/o(y)dy 

I*{y) J 

f z 

^3,Ae-l(l _ 


x)dx + |log(C' 3 ,J|. 


Since (74,^(1 “ ^ < Ue{x) < ^(1 “ x)^^ ^ for suitable constants, it 

follows that 


log 


Ue{x) 


^Ag — 1 


(l-x) 


B,-l 


uo{x)dx < (76,e J [I log(x)| + I log(l — x)\]uo{x)dx < +00 


by assumption (S20). Hence 


0 < j log (^T 


fcAy) 


inf(a;,,.,;/)eD, K{y;uj,p,u) 


< + 00 . 


Verification of H3 of Theorem S3.1\ It follows immediately that, for every compact 
set (7, 

inf inf K(?/; a;, /x, z/) > inf/*(?/) 
y&C {LLi,fi,u)£De y&C 

and the right hand side is strictly positive by ( |S21| ). 

Verification of H4 of Theorem S3.1. The function {u,p,iy^y) 1 —)■ K{y;u, p,u) is 
continuous and hence uniformly continuous on the compact set C x D^. It follows 
that the family {{u,p,u) 1 —)• K{y,u>,p,i^) : y £ C} is uniformly equicontinuous on 
D,. □ 
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Proof of Theorem f.l, Write Hq and ho for and its density. By assumptions 

one gets that Hq is continuous and strictly increasing. Hence, if one defines 


uo{x) := 


ho{Ho\x)y 


it follows that foiu) = uo{Ho{y))ho{y)- Note that uq turns out to be a continuous 


function on (0,1). It remains to check that assumption (15) yields (S20). Now, a 
change of variable gives 


J \ ^og{H{y\u:*))\fo{y)dy = J \ log{H{y\u*))\uo{Hoiy))hoiy)dy = J \ log{x)\uo{x)dx. 
Similarly for f \ log{l — H{y\u:*))\]fo{y)dy. Finally 


KL{fo,h{-\uj*)) = / \og{uo{Ho{y))uoiHo{y))ho{y)dy = / uo{x)log{uoix))dx. 


□ 


Proof of Theorem \S2.1 
[SO 


The proof is a simple modification of the proof of Theorem 

□ 


Proof of Theorem \S2.2\ Given any measure Q on (0,1) x M"*“ recall that fqix) = 
uq{P[ {y\u>o))h{y\uQ) where uq{x) = f b*^ j^{x)Q{dyLdiy). Again, by a simple change of 
variables, KL{fG, /o) = KL{ug,uq). Hence to prove that /o G KLili*) it suffices to 
prove that for every e > 0, P{KL{ug,uo) < e} > 0. 

Now recall that if G ~ DP['ip,Go) then G admits the representation G = 
wide^ + (1 — wi)Gi where wi,9i = (^i,i^i) and Gi are stochastically independent, 
Gi ~ DP{'iIj, Go), wi ~ Heta(l, ijii) and 6i ~ Gq. 

Given r],r]' > 0 define := {(tc, y, v) G (0,1)^ x M"*" : |t(; — rcol < ??, |/U — /xo| < 
V, W-M < V} andZ^*^^, := {G = wi6gj^ + {l-wi)Gi : {wi,9i) G Urj, Iugi-up^i < rj'}, 
where we denote by |ui — tt2|i = J \ui — U2\dx he the Li distance between two densities 
ui and U2- 

Note that if G G G* then 

ug{x) > wiby^^,yx) > c^x^’'(l - x)'®” 


where 


Wo -T] 

’ B{{y.o - ??)(z^o - V), (I-ho- r]){i'o - v)) ’ 

A := {ho + f?)(z^o + r])-l, Byj := (1 - /tq + 'n){^o + ??) - 1, 
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provided that — rj,l — hq — rj,VQ — rj^WQ — rj are positive. Hence, for any C > 0; 


ug{x). 


Uq{x)^ 


< _ 


=■■ g*{x) 


for a suitable constant c*^. By assumption (S19), there is C such that Cq := 


-riX- 


f g*(x)uo(x)dx < + 00 . Hence for such (t/, C), by Lemma 7 of 


Ghosal and van der 


Vaart (2007), 


KL{uo,ug) < CidH{uG,uo)[l + meiyi{0,log{dj/{uG,uo)))] 

where dH{uG,uo) = {f{y'uG — is the Hellinger distance between uq and 

ug- Note that the constant Ci depends on Co,r] and C only. Since dH{uG,uo)'^ < 
\ug — uo|i (see, e.g.. Corollary 1.2.1 in Ghosh and Ramamoorthi (2003)) it follows 
that 

KL{uo,ug) < C2\uG - 

for every G when is small enough. Now, it is easy to check that 

\UG - Woll < ‘^\wi - Wol + \uSg^ - USgJl + \UGi - liPoll 

and that |i goes to zero as |0i — 6*o| —>• 0. Since if rj" < rj then U*ii CU*^,, 

using the previous results, for every e > 0, one can find sufficiently small r]' and rj” < rj 
such that if G £ Id*/, the n KL{ug, uq) < e. By standard argument (see e.g. the proof 
of Thm.2 in Tang et al. (2007)) if Gi is in a sufficiently small weak neighbourhood 
Vfji of Pq then |upp — ucji < rj', hence 

{G = wi5e^ + (1 - ^^^l)Gl : Gi G 14,/, {wi,9i) G 7/^//} C C {KL{ug,uo) < e}. 

Moreover, supp{Po) C supp{Go) yields that Pq belongs to the support of Dir{(p,Go) 
and hence P(Gi G 14,/) > 0, while the fact that 9i G supp{Go) yields that 
P{{wi,9i) G Urj") > 0. Using the independence of {wi,9i) and Gi, one concludes 

P{KL{uo,ug) <e)> P{G G > P{Gi G 14,/)P((u;i, 0i) G 7f^//) > 0. 

□ 


S3.2 Proofs of the results of Section 14.21 

As for the proof of the results for Markovian observations, the starting point is a 
suitable adaptation of Theorem 1 and Lemma 3 of Wu and Ghosal (2009a). In 
the Markovian setting, the prior H* on P is induced via the map G i—)• fG{v\x) '■= 
Jq K{y\x, 0)G{d0), where 0 is the mixing parameter space, K{y\x, 6) is a transition 
kernel and G has a prior H on the space At(0) of probability measures on 0. Recall 
that in our model 0 = Am x 0i, where Am and 0i = (0,1) x M"*" are the combination 
and calibration parameter spaces, respectively. 
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Theorem S3.3. Let 0 = Am x 0i, 0i being a separable metric space. If for any 
e > 0 there is a probability measure G suppili) such that 

11'^og{fo{y\x)/fGAy\x))My\x)dy7r{dx) < e, 


(M2) there is a closed set D* which contains sup(Ge) in its interior or = Am x 

with Df: is closed in 0i and sup(G£) C x D° (D° being the interior of D^), 
for which 


log 


fGAy\x) 


infeeD* K{y\x,9) 


x)dy'K{dx) < + 00 , 


(M3) inf( 3 , K{y\x,6) > 0 for every compact set C C y x y , 

(Mf) {6 I—)■ K{y\x,9) : {x,y) G C} is uniformly equicontinuous on D*, 
then fo G KL{U*). 


Proof. The proof follows the same lines of the proofs of Theorem 1 and Lemma 3 of 
Wu and Ghosall (|2009a|). □ 


Theorem S3.4. Under the assumption of Theorem 4-2 then fo G KL{Il*). 

Proof. We apply Theorem S3.3 with 0i = (0,1) x M+, K{y\x,6) = 
b*^^^{H{y\x,Lj))h{y\x,u) with 9 = {Op,6c) and Gc{dudp,dv) := G*{dudiJLdv). In this 
case, since fc^ = fo* = fo, Ml is automatically satisfied, so it remains to check the 
other assumptions. 

Verification of M2 of Theorem S3.^ One can find a compact set in 
(0,1) X (0,+oo) such that the support of Go is contained in Am x Z1°. At this 
stage one can check that 

inf H{y\x, - H{y\x, > H{y\x, u)^{l - H{y\x, u))^ > R-{y\x)^ 


where A := max{sup{/«/ : {fi,^) G He},sup{(l — fa)iy : {pL^u) G Dc}} > 0. Hence, 
setting D* ;= Am x 


I{y\x) 


inf K{y\x,u,y,v)> 


Cir-{y\x)R-{y\x)^ 

R+{y\x) 


r{y\x) 


where Gi = B{y.v, (1 — y)v) : {y,v) G Dc] > 0. Clearly /G'^(ylx) > I{y\x) and 

hence log(/G'£(?/|x)//e(7/|x)) > 0, moreover, recalling that /G^(y|x) = fo{y\x), one can 
write 


S / log (((Hfy) Afelx)* 


< 


^og{fo{y\x))\fo{y\x)dy + / \ log{r{y\x))\fo{y\x)dy. 


SIO 














Combining this with the assumptions, one gets 


0 < 



log 


inf 




fcAvlx) __ 

,i^,u)£D, K{y\x,u,^i,v) 


x)dy'K{dx) < + 00 . 


Verification of M3 of Theorem \S3.3 . 
of the densities hm, it follows immediately that, 


Using the positivity assumption 
for every compact set C, 


inf(x,y)ec > 0. 

Verification of Mf of Theorem S3.3[ The function (cj, /r, u, x, y) i—)• K{y\x, cj, y, u) 
is continuous and hence uniformly continuous on the compact set C x Dg. It follows 
that the family {{u^y,v) i—)• K{y]u, y,i^) : {x,y) G C} is uniformly equicontinuous 
on Df. □ 


Proof of Theorem Let A : {f : f j f g(y)[f(yjx) — fo{y\x)]dy\X{dx) > e} where g 
is a bounded continuous function, say \g{y)\ < 1- Let C C y a compact set such that 
A(C'=) < e/4 and set M = 2A(C). If / G A then j f g(y)lf(ylx)-fo(ylx)]dylA(dx) > 
e/2 and sup^-gc- / g(g)lf(ylx) - fo(ylx)]dy > ejM. Let Xf = argmax{f g(y)lf(ylx) - 
fo{y\x)]dy : x G C}, then if x G C 


g{y)[f{y\x) - fo{y\x)]dy > — - R{f,x) 


where i?(/,x) := f g{y)[f{y\x) - f{y\xf)]dy + f g{y)[fo{y\x) - fo{y\xf)]dy . Now 

for every compact set K C (0,1) x M"*" and every compact set C <Z y the functions 
K X C X C' X Am ^ {y, v, x, y, u) i—)■ b*^ ,^{H{y\x, u)) and C x C' x Am ^ {x, y, u>) i—)• 
h{y\x, u) are uniformly continuous. Hence for every e' there is 5 = 6{e, K, C, C) such 
that 


sup \h*^^^{H{y\x,u)) - h*^^^{H{y\x',u))\ < e', 

(li,i',y,CLi)GKxC'xAM 

sup \h{y\x,u) - h{y\x',u)\ < d 
{y,oj)&C'xAM 

whenever |x - x'\ < 5. Moreover S := sup(^^^^,^_y_^)g^xCxC'xA m < 

+ 00 . Now for / in the support of the prior, recalling that f{y\x) = 
j b*^ ^^{H{y\x,u))h{y\x,u)G{dydudu), for some G with supp{G) C suppiGo) C 
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K X Ajvf, it follows that whenever \x — Xf \ <5 one can write 
9iy)[fiy\x) - f{y\xf)]dy 

K,AH{y\x,u)) - b*^^^{H{y\xf,u))\h{y\x,u;) 


Ic 


< / 

JKxAm JC 

+ %,v{H{y\xf,u))\h{y\x,u) - h{y\xf,u)\ dyG{d^idvdu) 


<e' 


<e' 


IKxAm JC 


h{y\x,u)[l + 


S 


h{y\x,u:\ 


-]dyG{dy,dvdu) 


IkxAm Jc 


h{y\x,u){l + ^)dyG{dy,dvdu) < e'(l + 


where c* = mhijn=i,...,Mva.i(^^y)^cxC' fm{y\x) > 0 by assumption. 
Consider now G' = [—L, L], then by a simple change of variables 


[ f{y\x)dy= [ 
<{C'Y Jr 


KxAm J{0,H{-L\x,oj))U{H{L\x,oj),1) 


b*yy,{z)dzG{dy,di'duj) 


hence by 


'(C'Y 


9 {y)[f{y\x) - f{y\xf)]dy < 2 


'KxAm J (0,r;)U(l-»?,l) 


b*y^y{z)dzG{dyLdvd(jj). 


Now it is easy to see that sup(^ ^ Ab*{z) for a suitable density b* and a 

suitable constant A. Hence, for every e" one can find C" = [—L, L] such that 

\[ 9 {y)[f{y\x)-f{y\xf)]\dy <e" 

J[C'Y 

for every / in the support of H. Combining this statement with the first part of the 
proof it follows that there is 5 such that for every f ^ A which is in the support 
of n one has R{f,x) < e/2M whenever \x — Xf \ < 6. Hence for such / and x one 
has 1/ 9 {y)[f{y\x) — fo{y\x)]dy\ > ej{2M). Thus one can partition C = 
and A = such that the length of Gr is at most 5 and if / G then 

mix&Cr I / 9{y)[f{y\x) - fQ{y\x)]dy\ > e/(2M). Recalling that by Theorem [S^ we 


already know that the Kullback-Leibler property holds, by Corollary 4.1 in Tang 


and Ghosal (2007) it follows that n*(Hr) —)• 0 almost surely and hence n*(H) —)> 0 

□ 


a.s.. 


Lemma S3.1. 7/0 < u- < minm=i,...,M < rnaxmO-m < (T+, then for every y in . 
and Yi,..., ym 


—e 2 ^™(^(y|0,cj7) < Lp{y\Yyn,(ym) < —e 2 ^-(^(y|0,o-^) (S22) 

CT-|_ (T_ 
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with C'^ := maXm=l,...,M(<7m “ ^ •= maXm=l,...,M(<7+ — ^m) ^ 

Proof. It is easy to check that if < cj^ 

Piy\lJ'a,crl) ^ 06 

O-a 

Oa 

with l/fjJ := Ija\ — \la\ and := (y\—[i\,l By using the previous inequalit 

with Ha = 0 , 0-2 = ^2 ^ ^2 ^ ^ ^2 ^ ^2 ^ ^2 ^ 

respectively) after some algebra one gets the first (second, respectively) inequality in 
the thesis. □ 


Ma ^6 f Ma Mi V 


^ "i 


Mi /" (Ja Mi V 


X)m=i<^m¥?(yt|Aim + 4>7nyt-i, (^m) it follows from Lemma 

Cie~^^'^\{y\0,a‘i) < h{y\x,u:) < C 3 e^^^\{y\ 0 ,al), 

for suitable constants Ci, 6 * 2 , Cs, C' 4 . Henc e it is easy to see that [T^ of Theorem |4.2| 
holds true. In order verify 0 of Theoremwe set r {y\x) := Cie ip{y\0,af), 


Details of Example 4.4|. With the notation of Ex ampl e 4.4, if h{yt\yt-i,i^) = 


S3.1 


that 


R+{y\x) := C'|e2'^4"^"4>(y|x,o-^)(l - 4>(y|x, u^)). 


i?_(y|x) := Cy^^^"^y\x,al){l - Hy\x,a^_)). 


and 

Using (ii) of Example |4.1[ after some computations, one can show that 

I ^og{R+{y\x))\ + I log(i?_(y|x))| + | log(r_(y|x))| < K[l + x^ + y^] 

for a suitable constant K. Analogously, using again Lemma |S3.1| and recalling the 
definition of /o, one can prove that 

I log(/o(y|x))| < A:'[1 + x2 + ?/2 ] and fo{y\x) < K"ipo{y\x, cr^) 

for a suitable and K',K”. Hence, to check that (17) in Theorem 4.2 is satisfied it 
suffices to prove that 

J J [1 + x 2 + y‘^]ipo{y\x, al)dy'K{dx) < K J [1 + x‘^]TT{dx) < + 00 . 


Given the specific form of the mAR process we are dealing with, the existence of a 
stationary solution with finte second moment is well-known. 
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S4 Further simulation results 

In order to complete the simulation study, the following combination-calibration 
schemes have been considered in addition to the infinite component beta mixture 
(BMoo). 

• Beta-transformed linear pool (BMi) 

fivW) = fa,0 {H{y\u})) h{y\ui), 

where 6 = {a,j3,uj), h{y\uj) = uj(p{y\ — 1,1) -|- (1 — uj)if{y\2,l) and H{y\u}) = 

- 1 , 1 ) + (1 - 1 ). 

• Two-component finite beta mixture model (BM 2 ) 

f{y\0) = {H{y\uji)) h{y\u!i) + (1 - 'w)/a2,/32 {H{y\oj 2 )) h{y\u! 2 ), 

where 6 = (rc, oi, 02 ,/3i,/32, wi, ^ 2 ), and h{y\uj) and H{y\uj) have been defined 
as in the BC model. 


In the simulation experiments, the hyperparameter setting for the BC and BMC 
model is = 2, = 0.1 and = 1, and = 1, j = 1,2. The priors are 

informative, but with a large prior variance, thus one can expect posterior inference 
should not be affected by the hyperparameter settings. Our experiments show that 
the results, in terms of calibration, do not change when considering less informative 
prior settings, and secondly that the use of improper prior distributions in mixtures 
model, even if possible, still remains an open issue. See e.g. Wasserman (2000) for a 
discussion on the use of improper prior in mixture modelling. 


S4.1 Multimodality 

Figure shows the empirical cdfs of different sequences of probability integral 
transform (PIT). In all the experiments, the PIT of the non-calibrated model (red 
lines) is far from the standard uniform (black lines). In these datasets, the BC clearly 
lacks calibration. The BC cdf (green line) is closer to uniformity than the NC model, 
but it has difficulties in deforming the combination density some parts of the support. 

More specifically, the two-component beta calibrations are able to achieve a 
more flexible deformation of the cdf linear combination providing a calibrated cdf 
(blue and magenta lines) which is close to the uniform cdf. Figure shows the 
results of the calibration and combination procedure decomposed along the different 
components of the mixture. As an example consider the first dataset, generated with 
p = (1/5,1/5,3/5). The solid and dashed blue lines in the top-left plot of Figure]^ 
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p= (1/5,1/5,3/5) p= (1/7,1/7,5/7) 



Figure 6: PITs cdf for different calibration models and different datasets. 


show the contribution of the first and second component respectively of the BMCl 
mixture model to the calibration of the density. The first component mainly calibrates 
the pdf on the positive part of the support and the second component calibrates the 
pdf on the negative part of the support. Both components assign the same weights 
(w = 0.449) to the first model in the pool, i.e. A/(—1,1). This weight is higher than in 
the BC model, which has a less flexible calibration function and thus assigns a lower 
weight oj = 0.202 to the first model in the pool. The solid and dashed magenta lines 
in the top-left plot of Figure show a behaviour similar to the BMCl components. 

The first BMC2 component assigns weight wi = 0.043 to the first model in the 
pool. This means that the calibration on the negative part of the support set is 
done mainly using the predictive distribution of the second model, A/(2,1). The 
calibration of the positive part of the set is obtained thanks to the second BMC2 
component which assigns weight U 2 = 0.667 to the first model in the pool. 
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p= (1/5,1/5,3/5) 


p= (1/7,1/7,5/7) 



J 

—1 
— NC 
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_BMC1^ 

---BMCIj 

_BMC2^ 

---BMC22 




p= (3/5,1/5,1/5) p= (1/7,1/7,5/7) 



Figure 7: Contribution of the calibration components for different models BC, BMCl 
(first and second mixture component, BMCli and BMCI 2 ) and BMC2 (first and 
second mixture component, BMC2i and BMC 22 ), and different datasets. 


S4.2 Heavy Tails 

Fig [^focuses on the right tail of the predictive pdf and shows results for the calibrated 
and beta calibrated PITs cdf and their 99% HPD. There is strong evidence of the 
difficulties of the BC model in calibrating the tails. The BC underestimates the tail 
probability and over-estimates the central part of the distribution. The BMCl and 
BMC2 models are able instead to provide well-calibrated PITs on the tails of the 
distribution. 
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Figure 8: Results of the Bayesian calibration model BC (top), BMCl (middle) and 
BMC2 (bottom) for the right tail of the predictive distribution. In each plot the PITs 
cdf of the calibrated (solid black line) and beta calibrated model (solid coloured line) 
and their 99% HPD region (gray dashed lines) 


S5 Further real data results 



Figure 9: Cumulative log scores for the Normal GARCH model (Normal), t-GARCH 
model (Student-t), linear pooling (NC) and beta mixture calibration (BMC) over the 
sample period from January 1, 2007 to December 31, 2008. 
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Figure 10: Cumulative CRPS for the truncated normal (TN), the the generalized 
extreme value distribution (GEV), linear pooling (NC) and beta mixture calibration 
(BMC) over the sample period from August 9, 2010 to 30 April 2011. 
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Figure 11: Fanchart of the BMC model and observations (red points) over the sample 
period from January 1, 2007 to December 31, 2008. 
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Figure 12: Fanchart of the BMC model and observations (red points) over the sample 
period from August 9, 2010 to 30 April 2011. 
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